feat(payments): add MPP core runtime scaffolding #2
Conversation
…rtup check, tests

Five improvements to the /api/jobs endpoints:

1. Startup availability check — cron module imported once at class load, endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be a positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/repeat/enabled pass through to cron.jobs.update_job, preventing arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and cron-unavailable cases
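The field whitelist is the key hardening step here. A minimal sketch of the pattern, assuming a dict payload; ALLOWED_UPDATE_FIELDS and sanitize_update are illustrative names, not the actual endpoint code:

```python
# Illustrative sketch — not the actual implementation.
ALLOWED_UPDATE_FIELDS = {
    "name", "schedule", "prompt", "deliver", "skills", "repeat", "enabled",
}

def sanitize_update(payload: dict) -> dict:
    """Drop unknown keys so arbitrary fields never reach cron.jobs.update_job."""
    return {k: v for k, v in payload.items() if k in ALLOWED_UPDATE_FIELDS}
```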
…d_id format (NousResearch#2455)

Parse thread_id from explicit deliver target (e.g. telegram:-1003724596514:17) and forward it to _send_to_platform and mirror_to_session. Previously _resolve_delivery_target() always set thread_id=None when parsing the platform:chat_id format, breaking cron job delivery to specific Telegram topics.

Added tests:
- test_explicit_telegram_topic_target_with_thread_id
- test_explicit_telegram_chat_id_without_thread_id

Also updated CRONJOB_SCHEMA deliver description to document the platform:chat_id:thread_id format.

Co-authored-by: Alex Ferrari <alex@thealexferrari.com>
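A minimal sketch of the three-segment target parsing described above; parse_deliver_target is a hypothetical helper, not the real _resolve_delivery_target():

```python
# Hypothetical helper illustrating platform:chat_id[:thread_id] parsing.
def parse_deliver_target(target: str) -> tuple[str, str, int | None]:
    parts = target.split(":")
    platform, chat_id = parts[0], parts[1]
    thread_id = int(parts[2]) if len(parts) > 2 else None  # e.g. a Telegram topic
    return platform, chat_id, thread_id

assert parse_deliver_target("telegram:-1003724596514:17") == ("telegram", "-1003724596514", 17)
```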
…rtup check, tests (NousResearch#2456)

fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Remove the hardcoded Alibaba branch from resolve_runtime_provider() that forced api_mode='anthropic_messages' regardless of the base URL. Alibaba now goes through the generic API-key provider path, which auto-detects the protocol from the URL:

- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints (e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because runtime always forced Anthropic mode even when setup saved a /v1 URL.

Based on PR NousResearch#2024 by @kshitijk4poor.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
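The auto-detection amounts to a suffix check on the base URL. A sketch under that assumption (detect_api_mode is an illustrative name; the real logic lives in the generic API-key provider path):

```python
# Illustrative sketch of URL-based protocol detection.
def detect_api_mode(base_url: str) -> str:
    if base_url.rstrip("/").endswith("/apps/anthropic"):
        return "anthropic_messages"
    return "chat_completions"  # default for /v1-style endpoints

assert detect_api_mode("https://coding-intl.dashscope.aliyuncs.com/v1") == "chat_completions"
```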
Cherry-picked from PR NousResearch#2017 by @simpolism. Fixes NousResearch#2011.

Discord slash commands in threads were missing thread_id in the SessionSource, causing them to route to the parent channel session. Commands like /usage and /reset returned wrong data or affected the wrong session.

Detects discord.Thread channels in _build_slash_event and sets chat_type='thread' with thread_id. Two tests added.
…5d6932ba fix(discord): properly route slash event handling in threads
…kill (NousResearch#2461)

* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider() that forced api_mode='anthropic_messages' regardless of the base URL. Alibaba now goes through the generic API-key provider path, which auto-detects the protocol from the URL:

- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints (e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because runtime always forced Anthropic mode even when setup saved a /v1 URL. Based on PR NousResearch#2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search to the existing ocr-and-documents skill. No new dependencies — pymupdf already covers all three operations natively.

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
… (salvage NousResearch#1981) (NousResearch#2462)

* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider() that forced api_mode='anthropic_messages' regardless of the base URL. Alibaba now goes through the generic API-key provider path, which auto-detects the protocol from the URL:

- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints (e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because runtime always forced Anthropic mode even when setup saved a /v1 URL. Based on PR NousResearch#2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search to the existing ocr-and-documents skill. No new dependencies — pymupdf already covers all three operations natively.

* fix: replace all production print() calls with logger in rl_training_tool

Replace all bare print() calls in production code paths with proper logger calls.

- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()

Using the logging framework allows log level filtering, proper formatting, and log routing instead of always printing to stdout.

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
Two fixes:

1. CLI /stop command crashed with 'cannot import name get_registry' — the code imported a non-existent function. Fixed to use the actual process_registry singleton and list_sessions() method. (Reported in NousResearch#2458 by haiyuzhong1980)
2. Streaming media delivery used undefined 'adapter' variable — our PR NousResearch#2382 called _deliver_media_from_response(adapter=adapter) but 'adapter' wasn't guaranteed to be defined in that scope. Fixed to resolve via self.adapters.get(source.platform). (Reported in NousResearch#2424 by 42-evey)
…ery (NousResearch#2463) fix: /stop command crash + UnboundLocalError in streaming media delivery
…summary model

Closes NousResearch#2453

The DEFAULT_CONFIG was hardcoding google/gemini-3-flash-preview as the summary_model for context compression. This caused unexpected OpenRouter charges for users who configured a different provider/model, because the compression task would silently fall back to gemini via OpenRouter even when the user's main model was on a different provider.

Fix: change the summary_model default to an empty string. When empty, call_llm() resolves the model through the standard auto-detection chain (auxiliary.compression config -> env vars -> main provider), which correctly uses the user's configured provider and model.

Users who want a dedicated cheap model for compression can still explicitly set compression.summary_model in their config.yaml.
…summary model (NousResearch#2464) fix(compression): remove hardcoded gemini-3-flash-preview as default summary model
When sounddevice is installed but libportaudio2 is not present on the system, the OSError was caught together with ImportError and showed a generic 'pip install sounddevice' message that sent users down the wrong path. Split the except clause to give a clear, actionable message for the OSError case, including the correct apt/brew commands to install the system library.
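A minimal sketch of the split except clause; the messages are illustrative, not the exact wording shipped:

```python
# Illustrative sketch — importing sounddevice raises OSError (not ImportError)
# when the Python package is present but the PortAudio system library is missing.
try:
    import sounddevice
except ImportError:
    raise RuntimeError("sounddevice not installed — run: pip install sounddevice")
except OSError:
    raise RuntimeError(
        "PortAudio system library not found — install it with\n"
        "  Debian/Ubuntu: sudo apt install libportaudio2\n"
        "  macOS: brew install portaudio"
    )
```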
…sResearch#2428)

When subagents run in ThreadPoolExecutor threads, the shared stdout handle can close between thread teardown and KawaiiSpinner cleanup. Python raises ValueError (not OSError) for I/O operations on closed files:

    ValueError: I/O operation on closed file

The _SafeWriter class was only catching OSError, missing this case.

Changes:
- Add ValueError to exception handling in write(), flush(), and isatty()
- Update docstring to document the ThreadPoolExecutor teardown scenario

Fixes NousResearch#2428
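A minimal sketch of the widened exception handling, assuming a simple wrapper class (the real _SafeWriter wraps more methods and state):

```python
# Illustrative sketch of the _SafeWriter pattern.
class SafeWriter:
    def __init__(self, stream):
        self._stream = stream

    def write(self, text: str) -> int:
        try:
            return self._stream.write(text)
        except (OSError, ValueError):
            # ValueError covers "I/O operation on closed file" raised during
            # ThreadPoolExecutor teardown; swallow and report nothing written.
            return 0

    def flush(self) -> None:
        try:
            self._stream.flush()
        except (OSError, ValueError):
            pass
```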
imap.uid('search') can return data=[] when the mailbox is empty or
has no matching messages. Accessing data[0] without checking len first
raises IndexError: list index out of range.
Fixed at both call sites in gateway/platforms/email.py:
- Line 233 (connect): ALL search on startup
- Line 298 (fetch): UNSEEN search in the polling loop
Closes NousResearch#2137
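A minimal sketch of the guard applied at both call sites, assuming an imaplib connection; search_unseen is an illustrative helper, not the gateway code:

```python
# Illustrative sketch — imap is an imaplib.IMAP4 connection.
def search_unseen(imap):
    status, data = imap.uid("search", None, "UNSEEN")
    if status != "OK" or not data or not data[0]:
        return []  # empty mailbox / no matches — avoid IndexError on data[0]
    return data[0].split()
```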
…unt()

Both methods accessed self._conn without self._lock, breaking the thread-safety contract documented on SessionDB (line 111). All 22 other DB methods use with self._lock — these two were the only exceptions.

In the gateway's multi-threaded environment (multiple platform reader threads + single writer) this could cause cursor interleaving, sqlite3.ProgrammingError, or inconsistent COUNT results.

Closes NousResearch#2130
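A minimal sketch of the locking contract the fix restores; the class body is illustrative:

```python
# Illustrative sketch of the with-lock pattern all SessionDB methods follow.
import sqlite3
import threading

class SessionDB:
    def __init__(self, path: str):
        self._lock = threading.Lock()
        self._conn = sqlite3.connect(path, check_same_thread=False)

    def message_count(self) -> int:
        with self._lock:  # serialize access across platform reader threads
            return self._conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```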
…to prevent prefill rejection
fix: batch of 5 small contributor fixes — PortAudio, SafeWriter, IMAP, thread lock, prefill
* docs: add Gemini OAuth provider implementation plan

Planning doc for a standard-route Gemini provider using Google OAuth (Authorization Code + PKCE) with the OpenAI-compatible endpoint at generativelanguage.googleapis.com. Covers OAuth flow, token lifecycle, file list, and estimated scope (~700 lines). Replaces the Node.js bridge approach from PR NousResearch#2042.

* chore: update OpenRouter model list

- Add xiaomi/mimo-v2-pro
- Add nvidia/nemotron-3-super-120b-a12b (paid, higher rate limits)
- Remove openrouter/hunter-alpha and openrouter/healer-alpha (discontinued)
Based on PR NousResearch#2427 by @oxngon (core feature extracted, reformatting and unrelated changes dropped).

Discord's TYPING_START gateway event is unreliable for bot DMs. This adds a background typing loop that hits POST /channels/{id}/typing every 8 seconds (indicator lasts ~10s) until the response is sent.

- send_typing() starts a per-channel background loop (idempotent)
- stop_typing() cancels it (called after _run_agent returns)
- Base adapter gets stop_typing() as a no-op default
- Per-channel tracking via _typing_tasks dict prevents duplicates
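A minimal sketch of the per-channel loop, assuming asyncio and a trigger_typing() coroutine standing in for the raw POST /channels/{id}/typing call:

```python
# Illustrative sketch — channel.trigger_typing() is a stand-in, not the
# actual adapter API.
import asyncio

_typing_tasks: dict[int, asyncio.Task] = {}

async def _typing_loop(channel):
    while True:
        await channel.trigger_typing()  # indicator lasts ~10s
        await asyncio.sleep(8)          # refresh before it expires

def send_typing(channel) -> None:
    if channel.id not in _typing_tasks:  # idempotent per channel
        _typing_tasks[channel.id] = asyncio.create_task(_typing_loop(channel))

def stop_typing(channel) -> None:
    task = _typing_tasks.pop(channel.id, None)
    if task:
        task.cancel()
```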
Add hermes mcp add/remove/list/test/configure CLI for managing MCP
server connections interactively. Discovery-first 'add' flow connects,
discovers tools, and lets users select which to enable via curses checklist.
Add OAuth 2.1 PKCE authentication for MCP HTTP servers (RFC 7636).
Supports browser-based and manual (headless) authorization, token
caching with 0600 permissions, automatic refresh. Zero external deps.
Add ${ENV_VAR} interpolation in MCP server config values, resolved
from os.environ + ~/.hermes/.env at load time.
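A minimal sketch of the ${ENV_VAR} interpolation just described, assuming values resolve from os.environ (the real loader also merges ~/.hermes/.env first):

```python
# Illustrative sketch of ${ENV_VAR} substitution in config values.
import os
import re

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(value: str) -> str:
    # Unknown variables are left untouched rather than erased.
    return _ENV_REF.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)

os.environ["MCP_TOKEN"] = "abc123"
assert interpolate("Bearer ${MCP_TOKEN}") == "Bearer abc123"
```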
Core OAuth module from PR NousResearch#2021 by @imnotdev25. CLI and mcp_tool
wiring rewritten against current main. Closes NousResearch#497, NousResearch#690.
…5d6932ba feat(discord): persistent typing indicator for DMs
Reverts the sanitizer addition from PR NousResearch#2466 (originally NousResearch#2129). We already have _empty_content_retries handling for reasoning-only responses. The trailing strip risks silently eating valid messages and is redundant with existing empty-content handling.
…ch#2471) revert: remove trailing empty assistant message stripping
…sResearch#2472)

The /v1/responses endpoint used an in-memory OrderedDict that lost all conversation state on gateway restart. Replace with SQLite-backed storage at ~/.hermes/response_store.db.

- Responses and conversation name mappings survive restarts
- Same LRU eviction behavior (configurable max_size)
- WAL mode for concurrent read performance
- Falls back to in-memory SQLite if disk path unavailable
- Conversation name→response_id mapping moved into the store
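A minimal sketch of the open-with-fallback behavior described above; open_store and the pragma choice are illustrative:

```python
# Illustrative sketch — open the disk store with WAL, fall back to in-memory.
import sqlite3

def open_store(path: str) -> sqlite3.Connection:
    try:
        conn = sqlite3.connect(path, check_same_thread=False)
    except sqlite3.OperationalError:  # disk path unavailable
        conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")  # concurrent reads, single writer
    return conn
```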
…esearch#2469)

* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider() that forced api_mode='anthropic_messages' regardless of the base URL. Alibaba now goes through the generic API-key provider path, which auto-detects the protocol from the URL:

- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints (e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because runtime always forced Anthropic mode even when setup saved a /v1 URL. Based on PR NousResearch#2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search to the existing ocr-and-documents skill. No new dependencies — pymupdf already covers all three operations natively.

* fix: replace all production print() calls with logger in rl_training_tool

Replace all bare print() calls in production code paths with proper logger calls.

- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()

Using the logging framework allows log level filtering, proper formatting, and log routing instead of always printing to stdout.

* fix(gateway): process /queue'd messages after agent completion

/queue stored messages in adapter._pending_messages but never consumed them after normal (non-interrupted) completion. The consumption path at line 5219 only checked pending messages when result.get('interrupted') was True — since /queue deliberately doesn't interrupt, queued messages were silently dropped.

Now checks adapter._pending_messages after both interrupted AND normal completion. For queued messages (non-interrupt), the first response is delivered before recursing to process the queued follow-up. Skips the direct send when streaming already delivered the response.

Reported by GhostMode on Discord.

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
…31d7db3b feat(cli): MCP server management CLI + OAuth 2.1 PKCE auth
Follow-up to 669c60a (cherry-pick of PR NousResearch#2187, fixes NousResearch#2177).

The original fix emits a "\n\n" delta immediately after every _execute_tool_calls() invocation. When the model runs multiple consecutive tool iterations before producing text (common with search → read → analyze flows), each iteration appends its own paragraph break, resulting in 4-6+ blank lines before the actual response.

Replace the immediate delta with a deferred flag (_stream_needs_break). _fire_stream_delta() checks the flag and prepends a single "\n\n" only when the first real text delta arrives, so multiple back-to-back tool iterations still produce exactly one paragraph break.
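A minimal sketch of the deferred-flag mechanism; the class around it is illustrative:

```python
# Illustrative sketch of the deferred stream break.
class Streamer:
    def __init__(self, emit):
        self._emit = emit
        self._stream_needs_break = False

    def on_tool_iteration_end(self) -> None:
        self._stream_needs_break = True  # defer — don't emit "\n\n" yet

    def _fire_stream_delta(self, text: str) -> None:
        if self._stream_needs_break and text:
            text = "\n\n" + text  # exactly one break, on the first real text
            self._stream_needs_break = False
        self._emit(text)
```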
…ng (NousResearch#2473) fix: defer streaming iteration linebreak to prevent blank line stacking
* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider() that forced api_mode='anthropic_messages' regardless of the base URL. Alibaba now goes through the generic API-key provider path, which auto-detects the protocol from the URL:

- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints (e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because runtime always forced Anthropic mode even when setup saved a /v1 URL. Based on PR NousResearch#2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search to the existing ocr-and-documents skill. No new dependencies — pymupdf already covers all three operations natively.

* fix: replace all production print() calls with logger in rl_training_tool

Replace all bare print() calls in production code paths with proper logger calls.

- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()

Using the logging framework allows log level filtering, proper formatting, and log routing instead of always printing to stdout.

* fix(gateway): process /queue'd messages after agent completion

/queue stored messages in adapter._pending_messages but never consumed them after normal (non-interrupted) completion. The consumption path at line 5219 only checked pending messages when result.get('interrupted') was True — since /queue deliberately doesn't interrupt, queued messages were silently dropped.

Now checks adapter._pending_messages after both interrupted AND normal completion. For queued messages (non-interrupt), the first response is delivered before recursing to process the queued follow-up. Skips the direct send when streaming already delivered the response.

Reported by GhostMode on Discord.

* chore: add minimax/minimax-m2.7 to OpenRouter and MiniMax model catalogs

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
…ing idle (NousResearch#3398)

The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via asyncio.get_running_loop().create_task(). When an AsyncOpenAI client is garbage-collected while prompt_toolkit's event loop is running (the common CLI idle state), the aclose() task runs on prompt_toolkit's loop but the underlying TCP transport is bound to a different (dead) worker loop. The transport's self._loop.call_soon() then raises RuntimeError('Event loop is closed'), which prompt_toolkit surfaces as the disruptive 'Unhandled exception in event loop ... Press ENTER to continue...' error.

Three-layer fix:

1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI startup before any AsyncOpenAI clients are created. Safe because cached clients are explicitly cleaned via _force_close_async_httpx, and uncached clients' TCP connections are cleaned by the OS on exit.
2. Custom asyncio exception handler: Installed on prompt_toolkit's event loop to silently suppress 'Event loop is closed' RuntimeError. Defense-in-depth for SDK upgrades that might change the class name.
3. cleanup_stale_async_clients(): Called after each agent turn (when the agent thread joins) to proactively evict cache entries whose event loop is closed, preventing stale clients from accumulating.
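Layer 2 is a standard asyncio hook. A minimal sketch, with the handler name illustrative:

```python
# Illustrative sketch of the custom asyncio exception handler (layer 2).
import asyncio

def _quiet_closed_loop_handler(loop, context):
    exc = context.get("exception")
    if isinstance(exc, RuntimeError) and str(exc) == "Event loop is closed":
        return  # known-benign GC-time error; suppress silently
    loop.default_exception_handler(context)  # everything else stays loud

loop = asyncio.new_event_loop()
loop.set_exception_handler(_quiet_closed_loop_handler)
```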
NousResearch#3405)

Two independent bugs caused the reasoning box to appear three times when the model produced reasoning + tool_calls:

Bug A: _build_assistant_message() re-fired reasoning_callback with the full reasoning text even when streaming had already displayed it. The original guard only checked structured reasoning_content deltas, but reasoning also arrives via content tag extraction (<REASONING_SCRATCHPAD>/<think> tags in delta.content), which went through _fire_stream_delta not _fire_reasoning_delta. Fix: skip the callback entirely when streaming is active — both paths display reasoning during the stream. Any reasoning not shown during streaming is caught by the CLI post-response fallback.

Bug B: The post-response reasoning display checked _reasoning_stream_started, but that flag was reset by _reset_stream_state() during intermediate turn boundaries (when stream_delta_callback(None) fires between tool calls). Introduced _reasoning_shown_this_turn flag that persists across the tool loop and is only reset at the start of each user turn.

Live-tested in PTY: reasoning now shows exactly once per API call, no duplicates across tool-calling loops.
…ess (NousResearch#3274)

* refactor: suffix runtimeDeps PATH so apt-installed tools take priority

Changes makeWrapper from --prefix to --suffix. In container mode, tools installed via apt in /usr/bin now win over read-only nix store copies. Nix store versions become dead-letter fallbacks. Native NixOS mode unaffected — tools in /run/current-system/sw/bin already precede the suffix.

* feat(container): first-boot apt provisioning for agent tools

Installs nodejs, npm, curl via apt and uv via curl on first container boot. Uses a sentinel file so subsequent boots skip. Container recreation triggers a fresh install. Combined with the --suffix PATH change, agents get mutable tools that support npm i -g and uv without hitting read-only nix store paths.

* docs: update nixosModules header for tool provisioning

* feat(container): consolidate first-boot provisioning + Python 3.11 venv

Merge sudo and tool apt installs into a single apt-get update call. Move the uv install outside the sentinel so transient failures retry on next boot. Bootstrap a Python 3.11 venv via uv (--seed for pip) and prepend ~/.venv/bin to PATH so agents get writable python/pip/node out of the box.

---------

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
NousResearch#3366) (NousResearch#3421)

Two-layer caching for build_skills_system_prompt():

1. In-process LRU (OrderedDict, max 8) — same-process: 546ms → <1ms
2. Disk snapshot (.skills_prompt_snapshot.json) — cold start: 297ms → 103ms

Key improvements over original PR NousResearch#3366:

- Extract shared logic into agent/skill_utils.py (parse_frontmatter, skill_matches_platform, get_disabled_skill_names, extract_skill_conditions, extract_skill_description, iter_skill_index_files)
- tools/skills_tool.py delegates to the shared module — zero code duplication
- Proper LRU eviction via OrderedDict.move_to_end + popitem(last=False)
- Cache invalidation on all skill mutation paths:
  - skill_manage tool (in-conversation writes)
  - hermes skills install (CLI hub)
  - hermes skills uninstall (CLI hub)
  - Automatic via mtime/size manifest on cold start

prompt_builder.py no longer imports tools.skills_tool (avoids pulling in the entire tool registry chain at prompt build time).

6301 tests pass, 0 failures.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
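A minimal sketch of the OrderedDict LRU named above (move_to_end on hit, popitem(last=False) on eviction); the cache class is illustrative:

```python
# Illustrative sketch of the in-process LRU layer.
from collections import OrderedDict

class PromptCache:
    def __init__(self, max_size: int = 8):
        self._cache: OrderedDict[str, str] = OrderedDict()
        self._max = max_size

    def get(self, key: str) -> str | None:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark most recently used
            return self._cache[key]
        return None

    def put(self, key: str, value: str) -> None:
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict least recently used
```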
…nect (salvage NousResearch#3399) (NousResearch#3427)

Salvage of NousResearch#3399 by @binhnt92 with true agent interruption added on top. When a streaming /v1/chat/completions client disconnects mid-stream, the agent is now interrupted via agent.interrupt() so it stops making LLM API calls, and the asyncio task wrapper is cancelled.

Closes NousResearch#3399.
…rch#3419)

Salvage of PR NousResearch#1747 (original PR NousResearch#1171 by @davanstrien) onto current main. Registers Hugging Face Inference Providers (router.huggingface.co/v1) as a named provider:

- hermes chat --provider huggingface (or --provider hf)
- 18 curated open models via hermes model picker
- HF_TOKEN in ~/.hermes/.env
- OpenAI-compatible endpoint with automatic failover (Groq, Together, SambaNova, etc.)

Files: auth.py, models.py, main.py, setup.py, config.py, model_metadata.py, .env.example, 5 docs pages, 17 new tests.

Co-authored-by: Daniel van Strien <davanstrien@gmail.com>
…g models (NousResearch#3431)

Models like GLM-5/5.1 can think for 15+ minutes. The previous 900s (15 min) default for HERMES_API_TIMEOUT killed legitimate requests. Raised to 1800s (30 min) in both places that read the env var:

- _build_api_kwargs() timeout (non-streaming total timeout)
- _call_chat_completions() write timeout (streaming connection)

The streaming per-chunk read timeout (60s) and stale stream detector (180-300s) are unchanged — those are appropriate for inter-chunk timing.
…6K (NousResearch#3426)

The Anthropic adapter defaulted to max_tokens=16384 when no explicit value was configured. This severely limits thinking-enabled models where thinking tokens count toward max_tokens:

- Claude Opus 4.6 supports 128K output but was capped at 16K
- Claude Sonnet 4.6 supports 64K output but was capped at 16K

With extended thinking (adaptive or budget-based), the model could exhaust the entire 16K on reasoning, leaving zero tokens for the actual response. This caused two user-visible errors:

- 'Response truncated (finish_reason=length)' — thinking consumed most tokens
- 'Response only contains think block with no content' — thinking consumed all

Fix: add an _ANTHROPIC_OUTPUT_LIMITS lookup table (sourced from Anthropic docs and Cline's model catalog) and use the model's actual output limit as the default. Unknown future models default to 128K (the current maximum).

Also adds context_length clamping: if the user configured a smaller context window (e.g. custom endpoint), max_tokens is clamped to context_length - 1 to avoid exceeding the window.

Closes NousResearch#2706
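A minimal sketch of the lookup-and-clamp logic; the table entries are illustrative, not the full _ANTHROPIC_OUTPUT_LIMITS catalog:

```python
# Illustrative sketch — model keys and values are examples only.
OUTPUT_LIMITS = {
    "claude-opus-4-6": 128_000,
    "claude-sonnet-4-6": 64_000,
}
DEFAULT_LIMIT = 128_000  # unknown future models get the current maximum

def default_max_tokens(model: str, context_length: int | None = None) -> int:
    limit = OUTPUT_LIMITS.get(model, DEFAULT_LIMIT)
    if context_length is not None:
        limit = min(limit, context_length - 1)  # never exceed the window
    return limit
```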
…ousResearch#3428)

Previously, tirith exit code 1 (block) immediately rejected the command with no approval prompt — users saw 'BLOCKED: Command blocked by security scan' and the agent moved on. This prevented gateway/CLI users from approving pipe-to-shell installs like 'curl ... | sh' even when they understood the risk.

Changes:
- Tirith 'block' and 'warn' now both go through the approval flow. Users see the full tirith findings (severity, title, description, safer alternatives) and can choose to approve or deny.
- New _format_tirith_description() builds rich descriptions from tirith findings JSON so the approval prompt is informative.
- CLI startup now warns when tirith is enabled but not available, so users know command scanning is degraded to pattern matching only.

The default approval choice is still deny, so the security posture is unchanged for unattended/timeout scenarios.

Reported via Discord by pistrie — 'curl -fsSL https://mandex.dev/install.sh | sh' was hard-blocked with no way to approve.
…3440)

Show only agentic models that map to OpenRouter defaults:

- Qwen/Qwen3.5-397B-A17B ↔ qwen/qwen3.5-plus
- Qwen/Qwen3.5-35B-A3B ↔ qwen/qwen3.5-35b-a3b
- deepseek-ai/DeepSeek-V3.2 ↔ deepseek/deepseek-chat
- moonshotai/Kimi-K2.5 ↔ moonshotai/kimi-k2.5
- MiniMaxAI/MiniMax-M2.5 ↔ minimax/minimax-m2.5
- zai-org/GLM-5 ↔ z-ai/glm-5
- XiaomiMiMo/MiMo-V2-Flash ↔ xiaomi/mimo-v2-pro
- moonshotai/Kimi-K2-Thinking ↔ moonshotai/kimi-k2-thinking

Users can still pick any HF model via Enter custom model name.
…retry (salvage NousResearch#3389) (NousResearch#3449)

Salvage of NousResearch#3389 by @binhnt92 with reasoning fallback and retry logic added on top. All 7 auxiliary LLM call sites now use extract_content_or_reasoning(), which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.

Closes NousResearch#3389.
…less retries (NousResearch#3444)

When finish_reason='length' and the response contains only reasoning (think blocks or empty content), the model exhausted its output token budget on thinking with nothing left for the actual response. Previously, this fell into either:

- chat_completions: 3 useless continuation retries (model hits same limit)
- anthropic/codex: generic 'Response truncated' error with rollback

Now: detect the think-only + length condition early and return immediately with a targeted error message: 'Model used all output tokens on reasoning with none left for the response. Try lowering reasoning effort or increasing max_tokens.'

This saves 2 wasted API calls on the chat_completions path and gives users actionable guidance instead of a cryptic error. The existing think-only retry logic (finish_reason='stop') is unchanged — that's a genuine model glitch where retrying can help.
…ousResearch#3457)

hermes tools and _get_platform_tools() call get_plugin_toolsets() / _get_plugin_toolset_keys() without first ensuring plugins have been discovered. discover_plugins() only runs as a side effect of importing model_tools.py, which hermes tools never does. This means:

- hermes tools TUI never shows plugin toolsets (invisible to users)
- _get_platform_tools() in standalone processes misses plugin toolsets

Fix: call discover_plugins() (idempotent) in both _get_plugin_toolset_keys() and _get_effective_configurable_toolsets() before accessing plugin state. In the gateway/CLI where model_tools.py is already imported, the call is a no-op (discover_and_load checks the _discovered flag).
…rch#3469)

_expand_git_reference() and _rg_files() called subprocess.run() without a timeout. On a large repository, @diff, @staged, or @git:N references could hang the agent indefinitely while git or ripgrep processes slow output.

- Add timeout=30 to the git subprocess in _expand_git_reference() with a user-friendly error message on TimeoutExpired
- Add timeout=10 to the rg subprocess in _rg_files(), returning None on timeout (falls back to os.walk folder listing)

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
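A minimal sketch of the timeout handling on the git path; the function body and message are illustrative:

```python
# Illustrative sketch — bounded subprocess call with a friendly timeout error.
import subprocess

def expand_git_reference(args: list[str]) -> str:
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return "[git timed out after 30s — try a narrower reference]"
    return result.stdout
```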
…#3473)

Matrix platform was missing from the PLATFORMS config, causing a KeyError in _get_platform_tools() when handling Matrix messages. Every other platform (telegram, discord, slack, etc.) was present but matrix was overlooked.

Co-authored-by: williamtwomey <williamtwomey@users.noreply.github.com>
…NousResearch#3413) (NousResearch#3478)

* Fix NousResearch#3409: Add fallback to session_search to prevent false negatives on summarization failure

Fixes NousResearch#3409. When the auxiliary summarizer fails or returns None, the tool now returns a raw fallback preview of the matched session instead of silently dropping it and returning an empty list.

* fix: clean up fallback logic — separate exception handling from preview

Restructure the loop: handle exceptions first (log + nullify), build the entry dict once, then branch on result truthiness. Removes duplicated field assignments and makes the control flow linear.

---------

Co-authored-by: devorun <130918800+devorun@users.noreply.github.com>
…#3480)

* fix: cap context pressure percentage at 100% in display

The forward-looking token estimate can overshoot the compaction threshold (e.g. a large tool result pushes it from 70% to 109% in one step). The progress bar was already capped via min(), but pct_int was not — causing the user to see '109% to compaction', which is confusing. Cap pct_int at 100 in both the CLI and gateway display functions. Reported by @JoshExile82.

* refactor: use real API token counts for compression decisions

Replace the rough chars/3 estimation with actual prompt_tokens + completion_tokens from the API response. The estimation was needed to predict whether tool results would push context past the threshold, but the default 50% threshold leaves ample headroom — if tool results push past it, the next API call reports real usage and triggers compression then.

This removes all estimation from the compression and context pressure paths, making both 100% data-driven from provider-reported token counts. Also removes the dead _msg_count_before_tools variable.
)

- Change the default inference_base_url from the dashscope-intl Anthropic-compat endpoint to the coding-intl OpenAI-compat /v1 endpoint. The old Anthropic endpoint 404'd when used with the OpenAI SDK (which appends /chat/completions to a /apps/anthropic base URL).
- Update the curated model list: remove models unavailable on coding-intl (qwen3-max, qwen-plus-latest, qwen3.5-flash, qwen-vl-max), add third-party models available on the platform (glm-5, glm-4.7, kimi-k2.5, MiniMax-M2.5).
- URL-based api_mode auto-detection still works: overriding DASHSCOPE_BASE_URL to an /apps/anthropic endpoint automatically switches to anthropic_messages mode.
- Update the provider description and env var descriptions to reflect the coding-intl multi-provider platform.
- Update tests to match the new default URL and test the anthropic override path instead.
…ousResearch#3414) (NousResearch#3488)

* test(gateway): map fixture adapter by platform in progress threading tests
* fix(gateway): scope progress thread fallback to Slack only

---------

Co-authored-by: EmpireOperating <258363005+EmpireOperating@users.noreply.github.com>
…arch#3490)

EmailAdapter._seen_uids accumulates every IMAP UID ever seen but never removes any. A long-running gateway processing a high-volume inbox would leak memory indefinitely — thousands of integers per day.

IMAP UIDs are monotonically increasing integers, so old UIDs are safe to drop: new messages always have higher UIDs, and the IMAP UNSEEN flag already prevents re-delivery regardless of our local tracking.

The fix adds _trim_seen_uids(), which keeps only the most recent 1000 UIDs (half of the 2000-entry cap) when the set grows too large. Called automatically during connect() and after each fetch cycle.

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
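A minimal sketch of the trim, relying on the monotonicity argument above; constants follow the commit message:

```python
# Illustrative sketch of _trim_seen_uids().
MAX_SEEN = 2000
KEEP = 1000

def trim_seen_uids(seen: set[int]) -> set[int]:
    if len(seen) <= MAX_SEEN:
        return seen
    # UIDs increase monotonically, so the numerically highest are the newest.
    return set(sorted(seen)[-KEEP:])
```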
…, and gateway edge cases (salvage NousResearch#3489) (NousResearch#3492)

* fix: harden `hermes update` against diverged history, non-main branches, and gateway edge cases

The self-update command (`hermes update` / gateway `/update`) could fail or silently corrupt state in several scenarios:

1. **Diverged history** — `git pull --ff-only` aborts with a cryptic subprocess error when upstream has force-pushed or rebased. Now falls back to `git reset --hard origin/main` since local changes are already stashed.
2. **User on a feature branch / detached HEAD** — the old code would either clobber the feature branch HEAD to point at origin/main, or silently pull against a non-existent remote branch. Now auto-checkouts main before pulling, with a clear warning.
3. **Fetch failures** — network or auth errors produced raw subprocess tracebacks. Now shows user-friendly messages ("Network error", "Authentication failed") with actionable hints.
4. **reset --hard failure** — if the fallback reset itself fails (disk full, permissions), the old code would still attempt stash restore on a broken working tree. Now skips restore and tells the user their changes are safe in stash.
5. **Gateway /update stash conflicts** — non-interactive mode (Telegram `/update`) called sys.exit(1) when stash restore had conflicts, making the entire update report as failed even though the code update itself succeeded. Now treats stash conflicts as non-fatal in non-interactive mode (returns False instead of exiting).

* fix: restore stash and branch on 'already up to date' early return

The PR moved stash creation before the commit-count check (needed for the branch-switching feature), but the 'already up to date' early return didn't restore the stash or switch back to the original branch — leaving the user stranded on main with changes trapped in a stash. Now the early-return path restores the stash and checks out the original branch when applicable.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
What does this PR do?
Adds the internal xgate provider surface and the first provider-neutral MPP runtime scaffold for paid endpoints. Hermes now propagates payment_adapter/payment_config through CLI, gateway, cron, smart routing, and delegation, and AIAgent can handle an MPP-style 402 -> credential retry -> session reuse loop.
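A minimal sketch of that loop under stated assumptions — PaymentRequiredError, the adapter methods, and the receipt attribute are all illustrative names, not the actual agent/payments API:

```python
# Illustrative sketch of the 402 -> credential retry -> session reuse flow.
class PaymentRequiredError(Exception):
    """Raised when an endpoint answers 402 with an MPP challenge (illustrative)."""

def call_with_payment_retry(send_request, payment_adapter, sessions: dict):
    try:
        return send_request(session=sessions.get("mpp"))
    except PaymentRequiredError as challenge:
        credential = payment_adapter.acquire_credential(challenge)
        response = send_request(session=sessions.get("mpp"), credential=credential)
        if response.receipt:  # a receipt refreshes the reusable payment session
            sessions["mpp"] = payment_adapter.session_from_receipt(response.receipt)
        return response
```

Related Issue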
Fixes #
Type of Change
Changes Made
- hermes_cli/* and tests/test_api_key_providers.py.
- agent/payments/* with MPP challenge, credential, receipt, and in-memory session primitives.
- cli.py, gateway/run.py, cron/scheduler.py, agent/smart_model_routing.py, hermes_cli/runtime_provider.py, and tools/delegate_tool.py.
- run_agent.py support for MPP payment retry, session reuse, and receipt-driven session updates.
- tests/test_cli_provider_resolution.py, tests/test_payment_runtime_provider.py, tests/gateway/test_runtime_agent_kwargs.py, tests/tools/test_delegate.py, tests/tools/test_delegate_payment_runtime.py, tests/agent/test_smart_model_routing.py, and tests/test_run_agent.py.

How to Test
source .venv/bin/activate
python -m pytest -o addopts='' tests/test_payment_runtime_provider.py tests/agent/test_smart_model_routing.py tests/gateway/test_runtime_agent_kwargs.py tests/tools/test_delegate.py tests/tools/test_delegate_payment_runtime.py tests/test_cli_provider_resolution.py tests/test_run_agent.py -q

Checklist
Code
- Commit messages follow the conventional format (fix(scope):, feat(scope):, etc.)
- Ran pytest tests/ -q and all tests pass

Documentation & Housekeeping
- Updated documentation (docs/, docstrings) — or N/A
- Updated cli-config.yaml.example if I added/changed config keys — or N/A
- Updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A

Screenshots / Logs
Focused verification passed:
python -m pytest -o addopts='' tests/test_payment_runtime_provider.py tests/agent/test_smart_model_routing.py tests/gateway/test_runtime_agent_kwargs.py tests/tools/test_delegate.py tests/tools/test_delegate_payment_runtime.py tests/test_cli_provider_resolution.py tests/test_run_agent.py -q

280 passed, 1 warning